When there’s a high enough chance that someone won’t give in even if you use the horrible strategy, it’s not cost-effective to use the horrible strategy. And even if you model applying the horrible strategy to everyone who ever realizes the threat exists, group rejection still makes it cost-ineffective. So if you’re an AI choosing between winning strategies, and the horrible strategy comes out as a net loss in your models, you won’t use it. Therefore, stand strong! :P
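To put crude numbers on the cost-effectiveness point (purely illustrative values, nothing more):

```python
# Purely illustrative numbers for the cost-effectiveness argument above.
p_give_in   = 0.2    # chance a given person gives in to the threat
gain_if_in  = 5.0    # utility the AI gains from someone who gives in
cost_punish = 10.0   # cost to a friendly AI of actually punishing a refuser

expected_value = p_give_in * gain_if_in - (1 - p_give_in) * cost_punish
print(expected_value)  # -7.0: a net loss, so a winning-strategy chooser skips it
```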
I feel like this is something like burning a blackmail letter and then pretending never to have read it. If I know that the person in question at some point read and understood the blackmail letter, but then deliberately burnt it and tried to forget about it, I will still impose the punishment.
Why should a “horrible strategy” (which might not even be so very horrible; being punished is probably still better than an FAI not existing) be a net loss? Even if you don’t understand the blackmail or refuse to give in, FAI development is still a very important thing to do, and if you accept the blackmail and act accordingly, that’s beneficial to the FAI and therefore to all humans. So that’s zero loss against a slight gain.
Also, I wouldn’t even want to refuse the blackmail, because accepting it will be beneficial to humanity (provided I act accordingly, that is).
I don’t know what you mean by “group rejection”; a search didn’t bring up any results.
The blackmail letter hasn’t been sent, though, when you think about it. Until the AI actually is created and chooses a strategy, bargaining doesn’t stop. But it’s true that you have to be willing to counterfactually burn blackmail letters if you want to not receive them in the first place.
It’s not 0 loss for the AI. Especially if it’s friendly, doing horrible things to people goes against its other goals. It would like it better if you were happy, and besides doing horrible things takes resources, so there is definitely a cost.
“Group rejection” just meant “nearly everyone doesn’t give in.”
And yeah, if you want to be blackmailed, go for it :P
The blackmail letter hasn’t been sent, though, when you think about it.
Again, I am not convinced by this argument. I’d argue that the blackmail letter has been received, in a way analogous to a normal blackmail letter, the moment you consider that an FAI might do this to make itself happen faster. If you realize that it might punish you in some way, and this prospect has any chance of altering your actions, you have opened and read the letter. So the only way to resist the blackmail is not to think about it in the first place. Once you have thought about it, however, I really don’t see what one could do.
Imagine I have a pill which turns me into a mindless zombie (not the philosophical kind) for 3 hours, and which also leaves me with amnesia once I’m back to normal. I know that society punishes killing people. If I now deliberately take that pill (analogous to deliberately refusing the blackmail for no reason other than “I don’t want to get blackmailed”) and my zombie self kills someone, can I expect punishment?
This also has a real-world analogue: drinking is not illegal; committing crimes is. But in Germany, if one drinks oneself beyond a certain point (into a drunken stupor), one is considered not criminally responsible for one’s actions. However, one can be punished for having drunk oneself to that point.
Likewise, if you deliberately try to forget about that blackmail once you have thought about it, the future FAI might consider the deliberate act of forgetting punishworthy.
It’s not 0 loss for the AI. Especially if it’s friendly, doing horrible things to people goes against its other goals. It would like it better if you were happy, and besides doing horrible things takes resources, so there is definitely a cost.
Consider a punishment that influences my actions as follows: beforehand I was not very likely to speed up FAI development by one minute by doing X, which would create Y extra utility; after considering the blackmail, I am much more likely to do X. How large a punishment may the FAI impose on me without becoming unfriendly? It’s greater than zero, because if the AI, by threatening to punish me with Y-1 utility, gains an expected utility of Y that it would otherwise not gain, it will definitely make the threat. Note that the things the FAI might do to someone need not be horrible at all; the post-singularity world might just be a little less fun for me, but enough so that I’d prefer doing X.
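To make that concrete with some made-up numbers (only an illustration of the reasoning above, not a real model):

```python
# Made-up numbers illustrating the threat calculation above.
Y = 10.0                   # extra utility created if I do X
threat = Y - 1             # disutility the FAI threatens me with if I don't
p_x_without_threat = 0.1   # how likely I was to do X anyway
p_x_with_threat = 0.9      # how likely I am to do X once threatened

expected_gain = (p_x_with_threat - p_x_without_threat) * Y
print(expected_gain)   # 8.0 > 0, so threatening pays off for the FAI
# ...at least as long as the punishment rarely has to be carried out.
```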
If nearly everyone doesn’t give in after thinking about it, then indeed the FAI will only punish those who were in some way influenced by the threat of punishment, although “deliberately not giving in merely because one doesn’t want to be blackmailed” is kind of impossible; see above.
And yeah, if you want to be blackmailed, go for it :P
I have to assume that this (speeding up FAI development) is best in any case.
I’d argue that the blackmail letter has been received, in a way analogous to a normal blackmail letter, the moment you consider that an FAI might do this to make itself happen faster
You are simply mistaken. The analogy to blackmail may be misleading you; maybe try thinking about it without the analogy. You might also read up on the subject, for example by reading Eliezer’s TDT paper.
I’d like to see other opinions on this, because I don’t see the two of us getting any further.
I have now read the important parts of the TDT paper (more than just the abstract) and would say I understood at least those parts, though I don’t see anything there that contradicts my considerations. I’m sorry, but I’m still not convinced. The analogies serve to make the problem easier to grasp intuitively, but I initially thought about this without such analogies. I still don’t see where my reasoning is flawed. Could you try a different approach?
Hm. Actually, if you think about the following game, where A is the AI and B is the human:
         A1          A2
Bx    +9, -1      +10, -1
By    -1, -10      +0, +0
The Nash equilibrium of the game is A2,By—that is, not horrible and doesn’t give in.
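A quick sanity check of that claim (a sketch; I’m reading each payoff pair as (AI, human), which is the reading under which the stated equilibrium comes out):

```python
# Payoff pairs read as (AI, human).
# Rows: human's move (Bx = give in, By = don't give in).
# Columns: AI's move (A1 = horrible strategy, A2 = not horrible).
payoffs = {
    ("Bx", "A1"): (9, -1),   ("Bx", "A2"): (10, -1),
    ("By", "A1"): (-1, -10), ("By", "A2"): (0, 0),
}

def is_nash(b, a):
    ai, human = payoffs[(b, a)]
    ai_best = all(payoffs[(b, other_a)][0] <= ai for other_a in ("A1", "A2"))
    human_best = all(payoffs[(other_b, a)][1] <= human for other_b in ("Bx", "By"))
    return ai_best and human_best

print([cell for cell in payoffs if is_nash(*cell)])  # [('By', 'A2')]
```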
But if we have two agents facing off who don’t make their moves independently, but instead choose winning strategies, there are multiple equilibria. I should really read The Strategy of Conflict. The initiative to choose a particular equilibrium, however, is ours for the taking, for obvious temporal reasons. If we choose one of the equilibrium strategies, we dictate the corresponding equilibrium strategy to the AI.
You are probably correct: if it’s possible to plausibly precommit oneself to being influenced by no type of blackmail under any circumstances, then and only then does it not make sense for the AI to threaten to punish people; that is, an AI which punishes non-helpers who have made such a precommitment is unlikely. The problem is that making this precommitment might be very hard: an AI will still assign a probability greater than zero to the possibility that I can be influenced by the blackmail, and even as this probability approaches zero, the expected utility for the AI in the case where it does manage to convince me still converges to Y, which means that the punishment I have to expect if I don’t help will also converge to Y.
But wait! As the probability that I’m influenced by the AI shrinks, the probability that it ends up imposing a punishment of nearly Y without any good incentive grows, and since we are considering a friendly AI, this would also impose a negative expected utility of nearly Y on the AI itself. This should mean that the expected punishment shrinks much faster as the probability (as rated by the AI) that my actions are influenced by it goes down. Thus, the more convincingly I state that I won’t be influenced by any kind of blackmail, the more rapidly the expected punishment shrinks, effectively becoming a minor inconvenience or less, since the AI will also assume that I’ll try to avoid punishment and will therefore revise the probability of my being influenced even further down.
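One toy way to make this feedback explicit (my own sketch with an assumed cost model, not anything established above): suppose the AI assigns probability p to the threat actually influencing me, gains Y if it does, and, being friendly, counts actually imposing a punishment of size T as a utility loss of T. Then threatening only pays while p*Y > (1-p)*T, so the largest credible threat is about p*Y/(1-p), which collapses toward zero as p does.

```python
# Toy model of the feedback loop described above (assumed cost model).
# p = AI's credence that the threat influences me
# Y = utility the AI gains if I am influenced
# Carrying out a punishment of size T costs a friendly AI T in utility,
# so threatening only pays while p*Y > (1-p)*T.
def largest_credible_threat(p, Y=100.0):
    return p * Y / (1.0 - p)

for p in (0.5, 0.1, 0.01, 0.001):
    print(f"p = {p:g}: largest credible threat ~ {largest_credible_threat(p):.2f}")
# As p falls, the credible punishment shrinks toward a minor inconvenience or less.
```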
I hope I’m correct about this one. If so, thank you, Manfred.
I hereby state that I will under no circumstances be influenced by any kind of acausal threat from any kind of future Artificial Intelligence, which means that imposing any kind of punishment upon me is rendered completely useless.
The line of reasoning I followed before coming to this insight was flawed, and I dearly have to apologize for wasting your precious computing power, probable future AI, and your time and effort, Manfred, Mitchell Porter, Multiheaded.
Do I win? I hope I do.
Attempt to summarize: an FAI would likely be too nice to bother with not-perfectly-effective torture, and I suspect that it’s a big ol’ softie, so I’m not feeling threatened much, which makes it even less effective, and so on in a positive feedback loop.